27 research outputs found

    Cross-domain Voice Activity Detection with Self-Supervised Representations

    Voice Activity Detection (VAD) aims to detect speech segments in an audio signal, a necessary first step for many of today's speech-based applications. Current state-of-the-art methods focus on training a neural network that exploits features taken directly from the acoustics, such as Mel Filter Banks (MFBs). Such methods therefore require an extra normalisation step to adapt to a new domain where the acoustics are affected, which can be due simply to a change of speaker, microphone, or environment. In addition, this normalisation step is usually a rather rudimentary method with certain limitations, such as being highly sensitive to the amount of data available for the new domain. Here, we exploited the crowd-sourced Common Voice (CV) corpus to show that representations based on Self-Supervised Learning (SSL) can adapt well to different domains, because they are computed with contextualised representations of speech across multiple domains. SSL representations also achieve better results than systems based on hand-crafted representations (MFBs) and off-the-shelf VADs, with significant improvement in cross-domain settings.

    Textual properties and task-based evaluation: investigating the role of surface properties, structure and content

    This paper investigates the relationship between the results of an extrinsic, task-based evaluation of an NLG system and various metrics measuring both surface and deep semantic textual properties, including relevance. The latter rely heavily on domain knowledge. We show that they correlate systematically with some measures of performance. The core argument of this paper is that metrics grounded more firmly in domain knowledge shed more light on the relationship between deep semantic properties of a text and task performance. (Peer-reviewed)

    If it may have happened before, it happened, but not necessarily before

    Temporal uncertainty in raw data can impede the inference of temporal and causal relationships between events and compromise the output of data-to-text NLG systems. In this paper, we introduce a framework to reason with and represent temporal uncertainty from the raw data to the generated text, in order to provide a faithful picture to the user of a particular situation. The model is grounded in experimental data from multiple languages, shedding light on the generality of the approach. (Peer-reviewed)

    The Importance of Narrative and Other Lessons from an Evaluation of an NLG System that Summarises Clinical Data

    This research was funded by the UK Engineering and Physical Sciences Research Council, under grant EP/D049520/1. (Publisher PDF)

    Towards a possibility-theoretic approach to uncertainty in medical data interpretation for text generation

    Many real-world applications that reason about events obtained from raw data must deal with the problem of temporal uncertainty, which arises from error or inaccuracy in the data. Uncertainty also compromises reasoning where relationships between events need to be inferred. This paper discusses an approach to dealing with uncertainty in temporal and causal relations using Possibility Theory, focusing on a family of medical decision support systems that aim to generate textual summaries from raw patient data in a Neonatal Intensive Care Unit. We describe a framework to capture temporal uncertainty and to express it in generated texts by means of linguistic modifiers. These modifiers have been chosen based on a human experiment testing the association between subjective certainty about a proposition and the participants’ way of verbalising it. (Peer-reviewed)

    Text content and task performance in the evaluation of a natural language generation system

    An important question in the evaluation of Natural Language Generation systems concerns the relationship between textual characteristics and task performance. If the results of task-based evaluation can be correlated with properties of the text, there are better prospects for improving the system. The present paper investigates this relationship by focusing on the outcomes of a task-based evaluation of a system that generates summaries of patient data, attempting to correlate these with the results of an analysis of the system’s texts, compared to a set of gold standard human-authored summaries. (Peer-reviewed)

    The importance of narrative and other lessons from an evaluation of an NLG system that summarises clinical data

    The BABYTALK BT-45 system generates textual summaries of clinical data about babies in a neonatal intensive care unit. A recent task-based evaluation of the system suggested that these summaries are useful, but not as effective as they could be. In this paper we present a qualitative analysis of problems that the evaluation highlighted in BT-45 texts. Many of these problems are due to the fact that BT-45 does not generate good narrative texts; this is a topic which has not previously received much attention from the NLG research community, but seems to be quite important for creating good data-to-text systems. (Peer-reviewed)

    The role of graduality for referring expression generation in visual scenes

    Referring Expression Generation (REG) algorithms, a core component of systems that generate text from non-linguistic data, seek to identify domain objects using natural language descriptions. While REG has often been applied to visual domains, very few approaches deal with the problem of fuzziness and gradation. This paper discusses these problems and how they can be accommodated to achieve a more realistic view of the task of referring to objects in visual scenes. (Peer-reviewed)